Introduction:

This report analyses crime and climate data for Colchester in 2024–25 to determine whether climatic patterns are associated with street-level crime. The crime2024-25.csv dataset records individual crime incidents by type, location, and date. Two climate datasets, temp2024-25.csv and temp2023-24.csv, contain daily readings from a local weather station, including temperature, humidity, precipitation, and other meteorological parameters. By combining the crime and weather data, the analysis seeks to determine whether weather factors are associated with variations in the frequency or type of crime. The findings may help inform local policing and resource deployment. Comparing the 2024–25 climate data with the previous year also makes it possible to identify any major climatic anomalies that may have influenced crime patterns. The work combines criminology with environmental data science in the service of applied public safety research.

Dataset Explanation:

The crime2024-25.csv dataset contains detailed street-level crime information for Colchester covering the April 2024 to March 2025 timeframe, with 6,047 records. Each record is a unique crime incident, as coded by an ID and possibly a persistent ID. Key variables include the nature of the crime (e.g., anti-social behaviour, violent crime), date, geographical coordinates (latitude and longitude), and street-level location at which the incident took place. There are also columns that contain contextual data such as location type, road name, and outcome status (e.g., “Under investigation” or “Unable to prosecute suspect”). Some entries have missing data on fields such as persistent ID and outcome. The information comes from UK Police data and is intended to facilitate spatial and temporal examination of crime trends for evaluating patterns and responses to local police activity.

Data Preprocessing:

Preprocessing ensures that the crime dataset is clean and ready for analysis. The str(df) command first checks the structure of the data frame to confirm data types, and colSums(is.na(df)) counts the missing values in each column. The date column, originally stored as a “YYYY-MM” string, is parsed into a proper date format using ym() so that accurate time-based analysis is possible. The categorical variables category, location_type, and outcome_status are converted to factors to facilitate statistical modeling and plotting.

For missing values in outcome_status, the factor is first converted to character type to allow safe replacement. The replace_na() function then replaces the missing values with “Unknown” for uniformity and to avoid complications later in the analysis. Together, these steps prepare the dataset for summarization, visualization, and modeling while preserving data integrity and keeping the dataset compatible with R data-handling functions.
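The steps described above can be sketched as follows. This is a minimal illustration, assuming the lubridate and tidyr packages are installed; the small data frame is a made-up stand-in for crime2024-25.csv, and only the column names are taken from the dataset.

```r
library(lubridate)  # provides ym()
library(tidyr)      # provides replace_na()

# Toy stand-in for crime2024-25.csv (illustrative values, not real records)
df <- data.frame(
  category       = c("anti-social-behaviour", "violent-crime", "shoplifting"),
  date           = c("2024-04", "2024-05", "2024-05"),
  location_type  = c("Force", "Force", "BTP"),
  outcome_status = c(NA, "Under investigation", "Unable to prosecute suspect"),
  stringsAsFactors = FALSE
)

str(df)             # inspect the column types
colSums(is.na(df))  # count missing values per column

# Parse "YYYY-MM" strings into Date values (first day of each month)
df$date <- ym(df$date)

# Convert the categorical columns to factors
df$category       <- factor(df$category)
df$location_type  <- factor(df$location_type)
df$outcome_status <- factor(df$outcome_status)

# Go back to character so a new label can be introduced safely,
# fill the gaps with "Unknown", then restore the factor
df$outcome_status <- as.character(df$outcome_status)
df$outcome_status <- replace_na(df$outcome_status, "Unknown")
df$outcome_status <- factor(df$outcome_status)
```

The detour through character type matters because assigning a value that is not an existing factor level would otherwise produce NA with a warning.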

## 'data.frame':    6047 obs. of  13 variables:
##  $ X               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ category        : chr  "anti-social-behaviour" "anti-social-behaviour" "anti-social-behaviour" "anti-social-behaviour" ...
##  $ persistent_id   : chr  "" "" "" "" ...
##  $ date            : chr  "2024-04" "2024-04" "2024-04" "2024-04" ...
##  $ lat             : num  51.9 51.9 51.9 51.9 51.9 ...
##  $ long            : num  0.896 0.904 0.895 0.921 0.898 ...
##  $ street_id       : int  2153038 2153245 2153000 2153730 2153077 2153077 2153426 2153593 2153012 2153237 ...
##  $ street_name     : chr  "On or near North Hill" "On or near Bus/coach Station" "On or near Church Street" "On or near Tarrett Drive" ...
##  $ context         : logi  NA NA NA NA NA NA ...
##  $ id              : int  118021898 118022736 118022480 118022387 118022363 118022329 118022316 118022288 118022276 118022270 ...
##  $ location_type   : chr  "Force" "Force" "Force" "Force" ...
##  $ location_subtype: chr  "" "" "" "" ...
##  $ outcome_status  : chr  NA NA NA NA ...
##                X         category    persistent_id             date 
##                0                0                0                0 
##              lat             long        street_id      street_name 
##                0                0                0                0 
##          context               id    location_type location_subtype 
##             6047                0                0                0 
##   outcome_status 
##              668
Table - Frequency of Crime Categories

| Crime Category | Frequency |
|---|---|
| anti-social-behaviour | 668 |
| bicycle-theft | 151 |
| burglary | 157 |
| criminal-damage-arson | 466 |
| drugs | 231 |
| other-crime | 91 |
| other-theft | 399 |
| possession-of-weapons | 58 |
| public-order | 451 |
| robbery | 81 |
| shoplifting | 643 |
| theft-from-the-person | 84 |
| vehicle-crime | 253 |
| violent-crime | 2314 |

Table 1: Crime Category Frequencies: This frequency table lists the number of reported offences in each crime category. Violent crime (2,314) is by far the most frequent, followed by anti-social behaviour (668) and shoplifting (643). Possession of weapons (58) and robbery (81) are the least frequent. The table identifies the most prominent crime types in Colchester for 2024–25.

Table - Frequency of Outcome Status

| Outcome Status | Frequency |
|---|---|
| Action to be taken by another organisation | 114 |
| Awaiting court outcome | 314 |
| Court result unavailable | 218 |
| Formal action is not in the public interest | 51 |
| Further action is not in the public interest | 4 |
| Further investigation is not in the public interest | 2 |
| Investigation complete; no suspect identified | 2002 |
| Local resolution | 174 |
| Offender given a caution | 52 |
| Status update unavailable | 235 |
| Suspect charged as part of another case | 1 |
| Unable to prosecute suspect | 1793 |
| Under investigation | 419 |
| Unknown | 668 |

Table 2: Outcome Status Frequencies: This table shows the recorded outcome of each offence. The most common is “Investigation complete; no suspect identified” (2,002 cases), implying difficulty in identifying culprits. “Unable to prosecute suspect” (1,793) is also common. “Unknown” (668) accounts for records with a missing outcome. Formal or court-based outcomes occurred in only a small minority of cases.

Table - Location Type vs. Outcome Status

| Outcome Status | BTP | Force |
|---|---|---|
| Action to be taken by another organisation | 0 | 114 |
| Awaiting court outcome | 0 | 314 |
| Court result unavailable | 0 | 218 |
| Formal action is not in the public interest | 0 | 51 |
| Further action is not in the public interest | 0 | 4 |
| Further investigation is not in the public interest | 0 | 2 |
| Investigation complete; no suspect identified | 0 | 2002 |
| Local resolution | 0 | 174 |
| Offender given a caution | 0 | 52 |
| Status update unavailable | 14 | 221 |
| Suspect charged as part of another case | 0 | 1 |
| Unable to prosecute suspect | 0 | 1793 |
| Under investigation | 4 | 415 |
| Unknown | 0 | 668 |

Table 3: Outcome by Police Force: This table breaks down outcomes by location type, that is, by whether a record belongs to Force or to BTP. All substantive outcomes (e.g., prosecutions, completed investigations, cautions) occurred under Force. BTP contributed only 18 records, of which 14 have no status update available and 4 are still under investigation, confirming that Force handled the overwhelming majority of crimes and follow-up actions.

Table - Crime Category vs. Location Type

| Crime Category | BTP | Force |
|---|---|---|
| anti-social-behaviour | 0 | 668 |
| bicycle-theft | 1 | 150 |
| burglary | 0 | 157 |
| criminal-damage-arson | 1 | 465 |
| drugs | 1 | 230 |
| other-crime | 0 | 91 |
| other-theft | 5 | 394 |
| possession-of-weapons | 0 | 58 |
| public-order | 3 | 448 |
| robbery | 1 | 80 |
| shoplifting | 0 | 643 |
| theft-from-the-person | 2 | 82 |
| vehicle-crime | 0 | 253 |
| violent-crime | 4 | 2310 |

Table 4: Crime Type by Police Force: This table compares crime categories between BTP and Force. BTP handled very few crimes (e.g., 1 bicycle theft, 4 violent crimes), while Force handled the rest, including all anti-social behaviour, burglary, and shoplifting. This is consistent with general street crime in Colchester falling within local police jurisdiction.
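Frequency tables and cross-tabulations of this kind can be produced with base R's table(). The sketch below is illustrative only: the data frame is a made-up stand-in for the crime data, and the object names are hypothetical.

```r
# Toy stand-in for the crime data (illustrative values only)
df <- data.frame(
  category      = c("bicycle-theft", "violent-crime", "violent-crime",
                    "shoplifting", "other-theft"),
  location_type = c("BTP", "Force", "Force", "Force", "BTP")
)

# One-way frequency table of crime categories
category_counts <- table(df$category)
print(category_counts)

# Two-way table: rows are crime categories, columns are location types
category_by_force <- table(df$category, df$location_type)

# Convert to a plain data frame with one column per location type
category_by_force_df <- as.data.frame.matrix(category_by_force)
print(category_by_force_df)
```

Using as.data.frame.matrix() keeps the wide layout (one column per location type), whereas as.data.frame() on a table would return one row per combination.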

##Graph 1: This graph shows the monthly number of cases between May 2024 and March 2025. Starting from about 550 cases in May 2024, the count decreases steadily month by month. By March 2025 it has fallen to about 400 cases, a decline of roughly 27% over the period.
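The monthly counts behind such a graph, and the size of the decline, can be computed as in the sketch below. It uses base R only; the dates are made up, pct_change() is a hypothetical helper, and the 550 and 400 figures are the approximate values read from the graph.

```r
# Toy stand-in for the parsed date column (illustrative values only)
crime_dates <- as.Date(c("2024-05-01", "2024-05-01", "2024-05-01",
                         "2024-06-01", "2024-06-01", "2025-03-01"))

# Count incidents per month; format() gives a sortable "YYYY-MM" label
monthly <- as.data.frame(table(month = format(crime_dates, "%Y-%m")),
                         responseName = "n")
print(monthly)

# Percentage change between two monthly counts (hypothetical helper)
pct_change <- function(first, last) (last - first) / first * 100

# Approximate values read from the graph: about 550 falling to about 400
round(pct_change(550, 400), 1)  # -27.3
```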

##Graph 2: This graph shows the top 10 crime categories in the dataset. Violent crime, anti-social behaviour, and shoplifting are the three most common crimes in the city. Violent crime occurred more than 2,000 times (2,314 in the category frequency table), whereas bicycle theft and burglary are the least frequent of the ten.

##Graph 3: This bar chart shows the distribution of case outcome statuses. The most frequent are “Investigation complete; no suspect identified” and “Unable to prosecute suspect,” with about 2,000 and 1,800 cases respectively. “Under investigation” and “Status update unavailable” are also frequent. Several of the outcomes indicate that no further action was taken.

##                                street_name   n
## 1                 On or near Shopping Area 498
## 2                   On or near Supermarket 488
## 3                     On or near Nightclub 196
## 4                 On or near George Street 147
## 5  On or near Conference/exhibition Centre 138
## 6                  On or near Parking Area 137
## 7            On or near Culver Street West 136
## 8                On or near Police Station 135
## 9            On or near St Nicholas Street 112
## 10               On or near Cowdray Avenue 108

##Graph 4: This pie chart shows the five most common crime types. By frequency, violent crime is the largest category (2,314), followed by anti-social behaviour (668), shoplifting (643), criminal damage and arson (466), and public order offences (451).

##Graph 5: This histogram displays the frequency distribution of crime latitudes, from 51.875 to 51.901. The y-axis shows counts (0–750), and peaks mark latitudes with a high density of incidents. The tallest bar (presumably around 51.890–51.900) suggests a cluster of incidents, which makes the plot useful for spotting geographic concentration of crime.

##Graph 6: The violin plot depicts how the spread of latitude values varies between outcome statuses. Most outcomes are tightly concentrated around latitude 51.885, whereas “Investigation complete; no suspect identified” appears to have a noticeably wider distribution.

##Graph 7: This scatter plot maps longitude (x-axis, 0.88–0.92) against latitude (y-axis, 51.875–51.905) for the different crime types in the study area. Each point is a crime incident, color-coded by type (for example, theft or violent crime). The graph shows how crime is distributed geographically, with clusters of points revealing areas of high crime concentration.

##Graph 8: This correlation heatmap illustrates the relationship of latitude (lat) and longitude (long) with the other variables. The correlations shown range from -1 to -0.13, that is, from a perfect to a weak negative relationship. Denser color indicates a stronger negative association, meaning the related variable tends to decrease as the lat/long value increases.

##Graph 9: This box plot shows the distribution of latitudes (y-axis, 51.875–51.905) across various types of crimes (x-axis). Box plots for every crime type (e.g., drugs, theft) show median, quartiles, and outliers. Differences in latitude ranges suggest geospatial clusters of crimes—different types of crimes might be more frequent at specific locations.

##Graph 10: This script identifies the 20 streets with the most recorded crimes in Colchester by grouping and summarizing the dataset. It then uses the leaflet package to create an interactive map on which these streets are plotted with circle markers. Marker size and color saturation indicate the number of crimes. The script includes a legend for clarity, so users can visually identify high-crime locations, which supports spatial analysis of crime.
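A minimal sketch of such a script is given below, assuming dplyr is installed and leaflet is optionally available. The coordinates are invented for illustration, averaging the coordinates per street is an assumption about how each street is placed on the map, and the marker scaling and palette are arbitrary choices rather than the settings used for the graph.

```r
library(dplyr)

# Toy stand-in for the crime data (street names from the dataset,
# coordinates invented for illustration)
crime <- data.frame(
  street_name = c("On or near Shopping Area", "On or near Shopping Area",
                  "On or near Shopping Area", "On or near Supermarket",
                  "On or near Supermarket", "On or near Nightclub"),
  lat  = c(51.889, 51.889, 51.890, 51.892, 51.892, 51.887),
  long = c(0.901, 0.901, 0.902, 0.897, 0.897, 0.905)
)

# Count crimes per street and keep the 20 streets with the most records.
# Assumption: each street is placed at the mean of its incident coordinates.
top_streets <- crime %>%
  group_by(street_name) %>%
  summarise(n = n(), lat = mean(lat), long = mean(long), .groups = "drop") %>%
  arrange(desc(n)) %>%
  slice_head(n = 20)

# Build the interactive map only if leaflet is installed
if (requireNamespace("leaflet", quietly = TRUE)) {
  pal <- leaflet::colorNumeric("YlOrRd", domain = top_streets$n)
  crime_map <- leaflet::leaflet(top_streets) %>%
    leaflet::addTiles() %>%
    leaflet::addCircleMarkers(
      lng = ~long, lat = ~lat,
      radius = ~sqrt(n) * 3,          # marker size grows with crime count
      color = ~pal(n), fillOpacity = 0.7,
      label = ~paste0(street_name, ": ", n)
    ) %>%
    leaflet::addLegend(position = "bottomright", pal = pal,
                       values = ~n, title = "Crimes")
}
```

Printing crime_map in an interactive session or an R Markdown document renders the map.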

Dataset Explanation:

The temperature data consist of 366 daily weather readings from a station in the Colchester area (station_ID 3590), from April 1, 2023, to March 31, 2024. Each row is a weather reading for a specific date. Key variables are mean, maximum, and minimum temperature (°C), dew point temperature (TdAvgC), mean humidity (HrAvg%), wind speed and direction, sea-level and station-level air pressure (PresslevHp, PreselevHp), and rainfall (Precmm). Other variables capture cloud cover (TotClOct, lowClOct), sunshine duration (SunD1h), visibility (VisKm), and snow depth (SnowDepcm). Missing values are present in a few columns, notably snow depth and context-related measures. The dataset supports correlation analysis between weather conditions and other variables, such as crime patterns, and provides context on the environmental conditions surrounding local crime.

Table 6 - Two-way Table: Wind Direction vs. Precipitation Presence

| Wind Direction | No Precipitation | Precipitation |
|---|---|---|
| E | 9 | 2 |
| ENE | 18 | 3 |
| ESE | 8 | 5 |
| N | 6 | 4 |
| NE | 17 | 4 |
| NNE | 7 | 4 |
| NNW | 6 | 6 |
| NW | 8 | 4 |
| S | 11 | 14 |
| SE | 3 | 6 |
| SSE | 5 | 10 |
| SSW | 14 | 22 |
| SW | 21 | 26 |
| W | 13 | 12 |
| WNW | 11 | 6 |
| WSW | 24 | 27 |

Preprocessing and Table 1:

Preprocessing begins by converting the Date column of the weather data to a proper Date type using as.Date(), so that time-dependent operations are accurate. A two-way frequency table is then built to examine the relationship between wind direction (WindkmhDir) and the occurrence of precipitation (Precmm > 0). The > comparison yields a logical value that is TRUE on days with precipitation and FALSE on dry days. The table is converted to a data frame and passed to kable() for tidy tabular presentation.

The resulting table shows how precipitation varies with wind direction. WSW (West-Southwest) registered the highest number of rainy days (27), followed by SW (Southwest) with 26 and SSW (South-Southwest) with 22. By contrast, easterly winds such as E and ENE had only 2 and 3 precipitation days. These results suggest that precipitation in Colchester is most likely when the wind comes from southern and western directions, which is useful background when relating weather to crime patterns.
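The construction of this two-way table can be sketched as follows. The example assumes the knitr package is installed; the data frame is a made-up stand-in for the weather file, and the object names and caption are hypothetical.

```r
library(knitr)  # provides kable()

# Toy stand-in for the weather data (illustrative values only)
weather <- data.frame(
  Date       = c("2023-04-01", "2023-04-02", "2023-04-03", "2023-04-04"),
  WindkmhDir = c("WSW", "WSW", "E", "SW"),
  Precmm     = c(2.4, 0, 0, 1.1)
)

# Convert the date strings to Date values
weather$Date <- as.Date(weather$Date)

# Precmm > 0 is TRUE on wet days and FALSE on dry days;
# label the two cases so the table has readable column names
rain_flag <- factor(weather$Precmm > 0,
                    levels = c(FALSE, TRUE),
                    labels = c("No Precipitation", "Precipitation"))

# Cross-tabulate wind direction against precipitation presence
wind_rain <- table(weather$WindkmhDir, rain_flag)

# Convert to a data frame and render it as a formatted table
wind_rain_df <- as.data.frame.matrix(wind_rain)
kable(wind_rain_df, caption = "Wind Direction vs. Precipitation Presence")
```

Days with a missing Precmm value produce NA in the comparison and are left out of the table by default.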

##Graph 1: This bar chart displays the number of days by wind direction (for example, N, S, E, W, and intermediates such as ENE and SSW). Compass directions are shown along the x-axis; the y-axis is not visible but likely measures counts. Taller bars denote more frequent wind directions, which helps identify the prevailing winds in the dataset.

##Graph 2: This histogram of average temperature shows that temperatures mostly lie in the range of 5 to 10°C. On some days, mostly in winter, the temperature dropped below 0°C, and on some days it rose above 20°C.

##Graph 3: This boxplot compares mean temperatures (°C) across wind directions (e.g., N, S, E, W). Temperature (10–20°C) is on the y-axis, and wind directions (partially shown) are on the x-axis. The boxes show medians, quartiles, and outliers, and display how the temperature distribution varies by wind direction.

##Graph 4: This scatter plot investigates the relationship between atmospheric pressure (x-axis, hPa) and average temperature (y-axis, °C). The temperature range appears unusually broad (5–1200°C), which may be due to plotting mistakes or log scaling. Pressure values extend up to 1040 hPa. The Pearson correlation matrix reports only a weak correlation between PresslevHp and TemperatureCAvg (r = 0.09), so no strong pattern is expected in this plot.

Table: Pearson Correlation Matrix of Selected Weather Variables

| Variable | TemperatureCAvg | TemperatureCMax | TemperatureCMin | PresslevHp | Precmm |
|---|---|---|---|---|---|
| TemperatureCAvg | 1.00 | 0.98 | 0.95 | 0.09 | 0.03 |
| TemperatureCMax | 0.98 | 1.00 | 0.89 | 0.12 | -0.02 |
| TemperatureCMin | 0.95 | 0.89 | 1.00 | 0.02 | 0.09 |
| PresslevHp | 0.09 | 0.12 | 0.02 | 1.00 | -0.42 |
| Precmm | 0.03 | -0.02 | 0.09 | -0.42 | 1.00 |

Correlation Matrix

This correlation matrix shows strong positive correlations among the temperature variables, especially between mean and maximum temperature (r = 0.98). Precipitation (Precmm) is moderately negatively correlated with pressure (r = -0.42), suggesting that rain tends to fall on low-pressure days. All other correlations involving precipitation or pressure are weak (|r| ≤ 0.12).
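A Pearson correlation matrix of this kind can be computed with base R's cor(). The sketch below uses simulated values, so its numbers will not match the table; only the column names come from the weather data, and the choice of pairwise deletion for missing values is an assumption.

```r
set.seed(1)
n <- 50

# Simulated stand-in for the selected weather variables (not real readings)
tavg <- rnorm(n, mean = 11, sd = 5)
weather <- data.frame(
  TemperatureCAvg = tavg,
  TemperatureCMax = tavg + runif(n, 2, 6),
  TemperatureCMin = tavg - runif(n, 2, 6),
  PresslevHp      = rnorm(n, mean = 1015, sd = 10),
  Precmm          = rexp(n, rate = 0.5)
)
weather$Precmm[3] <- NA  # mimic a missing reading

# Pearson correlations; each pair uses the rows where both values are present
cor_mat <- round(cor(weather, method = "pearson",
                     use = "pairwise.complete.obs"), 2)
print(cor_mat)
```

With use = "pairwise.complete.obs", a missing value in one column only drops that row from the pairs involving that column rather than from the whole matrix.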

##Graph 5: This time series plot tracks average temperature (°C) from April 2023 through April 2024, with LOESS smoothing (trend line) highlighting seasonal patterns. Peaks presumably correspond to the summer months (e.g., Aug 2023), and troughs to the cold periods. The smoothed line makes long-term warming and cooling cycles easier to see amidst day-to-day fluctuations.
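To illustrate what LOESS smoothing does to a daily series, the sketch below fits base R's loess() to simulated temperatures and overlays the fitted trend. The series, the span of 0.3, and the use of base graphics are all assumptions made for illustration; the graph itself may have been produced differently (for example, with a plotting package's built-in smoother).

```r
set.seed(42)

# Simulated daily series for April 2023 to March 2024 (not real readings):
# a seasonal sine wave peaking in late July, plus day-to-day noise
dates     <- seq(as.Date("2023-04-01"), as.Date("2024-03-31"), by = "day")
day_index <- as.numeric(dates - dates[1])
temp      <- 11 + 8 * sin(2 * pi * (day_index - 24) / 366) +
  rnorm(length(dates), sd = 2.5)
weather   <- data.frame(Date = dates, day_index = day_index,
                        TemperatureCAvg = temp)

# Fit a local regression; span sets the share of points used in each local fit
fit <- loess(TemperatureCAvg ~ day_index, data = weather, span = 0.3)
weather$trend <- predict(fit)

# Daily values in grey with the smoothed trend on top
plot(weather$Date, weather$TemperatureCAvg, type = "l", col = "grey60",
     xlab = "Date", ylab = "Average temperature (°C)")
lines(weather$Date, weather$trend, col = "blue", lwd = 2)
```

A larger span gives a smoother curve that follows only the seasonal cycle, while a smaller span tracks shorter warm and cold spells.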

##Graph 6: This interactive time series graph plots average temperature (°C) from April 2023 to April 2024, ranging from 5°C to 20°C. The user can presumably zoom in or hover for details. The graph shows seasonal movement, with warmer highs (e.g., summer 2023) and colder lows (e.g., winter 2023–24), and allows the trend to be explored dynamically. Gaps in the line point to missing data points or an incomplete rendering.

Dataset Explanation:

The 2024–25 climate dataset contains 365 daily values from a Colchester-region weather station (station_ID 3590), starting on April 1, 2024, and ending on March 31, 2025. Each row captures the weather measurements for a given day. Key variables are mean, maximum, and minimum temperature (°C), dew point (TdAvgC), humidity (HrAvg%), wind direction and speed, sea-level pressure (PresslevHp), rainfall (Precmm), cloud cover (TotClOct, lowClOct), sunshine duration (SunD1h), visibility (VisKm), and snow depth (SnowDepcm), with some missing values, mainly in the snow fields. These details help identify seasonal weather patterns, extreme events, and anomalies. The dataset can also be correlated with other datasets, such as the crime data, to explore possible climate-related influences on human behavior within Colchester.

Table - Two-way Table: Wind Direction vs. Precipitation Presence (2024–2025)

| Wind Direction | No Precipitation | Precipitation |
|---|---|---|
| E | 13 | 4 |
| ENE | 6 | 5 |
| ESE | 5 | 6 |
| N | 6 | 8 |
| NE | 11 | 8 |
| NNE | 7 | 4 |
| NNW | 2 | 15 |
| NW | 10 | 11 |
| S | 8 | 9 |
| SE | 8 | 6 |
| SSE | 10 | 11 |
| SSW | 11 | 21 |
| SW | 18 | 23 |
| W | 18 | 16 |
| WNW | 12 | 13 |
| WSW | 22 | 15 |

Dataset Description and Table

The dataset is first cleaned by converting the Date column to a Date type using as.Date() for reliable time filtering. It is then filtered to keep only records from 2024 and 2025 using filter(year(Date) %in% c(2024, 2025)). A two-way frequency table is built with table(), comparing wind direction (WindkmhDir) with the presence of precipitation, which is Yes if Precmm > 0 and No otherwise. The table is reshaped into a data frame for display using kable().

Table Result: The table shows that precipitation days are most numerous when the wind comes from southern and western directions, particularly SW (23), SSW (21), and W (16). Easterly directions (E 4, ENE 5, ESE 6) record comparatively few precipitation days. This suggests an association between wind direction and rain in Colchester, with winds from the southwest and west the most likely carriers of moisture and precipitation in 2024–2025.
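Because some wind directions occur far more often than others, raw counts of rainy days can be complemented by the share of days with precipitation for each direction. The sketch below does this arithmetic on the counts from the 2024–2025 table; the object names are hypothetical.

```r
# Counts copied from the 2024-2025 wind direction vs. precipitation table
dirs <- c("E", "ENE", "ESE", "N", "NE", "NNE", "NNW", "NW",
          "S", "SE", "SSE", "SSW", "SW", "W", "WNW", "WSW")
dry  <- c(13, 6, 5, 6, 11, 7, 2, 10, 8, 8, 10, 11, 18, 18, 12, 22)
wet  <- c(4, 5, 6, 8, 8, 4, 15, 11, 9, 6, 11, 21, 23, 16, 13, 15)

wind_rain <- cbind("No Precipitation" = dry, "Precipitation" = wet)
rownames(wind_rain) <- dirs

# Row-wise proportions: share of each direction's days that had precipitation
rain_share <- prop.table(wind_rain, margin = 1)[, "Precipitation"]
round(sort(rain_share, decreasing = TRUE), 2)
```

On these counts, NNW has the highest share of wet days (15 of 17, about 0.88) and E the lowest (4 of 17, about 0.24), while SW has the most wet days in absolute terms. Counts and shares answer different questions, so it helps to report both.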

##Graph 1: This graph illustrates the five most frequent wind directions in 2024–25: SW (25.8%), followed by WSW (22.6%), W (19.4%), SSW (18.8%), and WNW (13.4%). The percentages sum to 100%, so they are shares of the top-five total rather than of all days. They show that southwesterly and westerly winds dominate in this area.

##Graph 2: This plot shows the distribution of daily maximum temperatures (°C) for 2024 and 2025 over a range of 0°C to 30°C. Overlaid curves indicate where the frequency peaks, and denser regions mark the most common temperature ranges. The plot helps in comparing the two years and in spotting unusual weather patterns or climatic anomalies.

##Graph 3: This violin plot shows the distribution of sunshine duration (hours) for different wind directions (e.g., S, SW, W). Combining a boxplot with a density curve, it shows the median, spread, and frequency of sunshine hours for each wind direction. Suspicious labels (e.g., “MHz,” “VISV”) are probably data errors or encoding issues.

##Graph 4: This time series plots pressure values (hPa) from April 2024 to April 2025, with GAM smoothing to reveal trends. Pressure ranges from 980 to 1040 hPa, and the day-to-day changes most likely reflect passing weather systems. The smoothed curve highlights seasonal patterns, with dips corresponding to stormy periods and peaks to high-pressure periods. Note: “MPa” is likely a unit mistake (should read hPa).

##Graph 5: This interactive scatter plot displays maximum against minimum temperatures (°C) for 2024 and 2025, with points colored by year. The x-axis shows minimum temperatures (0–10°C) and the y-axis maximum temperatures (0–30°C). Users can likely hover over or click points for details. The plot reveals a positive relationship (higher maximum temperatures occur with higher minimum temperatures) and the year-to-year variability of temperature extremes.
